Cardiovascular disease detection from cardiac arrhythmia ECG signals using artificial intelligence models with hyperparameters tuning methodologies

Cardiovascular disease (CVD) is associated with irregular cardiac electrical activity, which appears as alterations in the ECG. Because it is convenient and non-invasive, the ECG is routinely used to identify different arrhythmias, and automatic ECG recognition is urgently needed. This paper presents an improved method for detecting CVDs such as Ventricular Tachycardia (VT), Premature Ventricular Contraction (PVC) and ST Change (ST) arrhythmia using different dimensionality reduction techniques and multiple classifiers. Three dimensionality reduction methods are employed: Local Linear Embedding (LLE), Diffusion Maps (DM) and Laplacian Eigenmaps (LE). Features are then selected from the dimensionally reduced ECG samples with the Cuckoo Search (CS) and Harmonic Search Optimization (HSO) algorithms. The publicly available MIT-BIH (Physionet) VT, PVC, ST Change and NSR databases were used in this work. The cardiovascular disturbances are classified with seven classifiers: Gaussian Mixture Model (GMM), Expectation Maximization (EM), Non-linear Regression (NLR), Logistic Regression (LR), Bayesian Linear Discriminant Analysis (BDLC), Detrended Fluctuation Analysis (Detrended FA) and Firefly. Averaged over the different classes, the overall accuracy of the classification techniques is 55.65 % without feature selection, 64.36 % with CS feature selection and 75.39 % with HSO feature selection. To further improve classifier performance, the hyperparameters of four classifiers (GMM, EM, BDLC and Firefly) are tuned with the Adam and Grid Search Optimization (GSO) approaches. For the CS feature-based classifiers, the average classification accuracy was 79.92 % with GSO and 85.78 % with Adam hyperparameter tuning. For the HSO feature-based classifiers, the average classification accuracy was 86.87 % with GSO and 93.77 % with Adam hyperparameter tuning. 
Classifier performance is analyzed in terms of accuracy, both with and without feature selection and with hyperparameter tuning. For ST vs. NSR, the highest accuracy of 98.92 % is achieved by the GMM classifier with LLE dimensionality reduction, HSO feature selection and Adam hyperparameter tuning. With this 98.92 % accuracy in detecting ST vs. NSR, the GMM classifier with Adam hyperparameter tuning outperforms all other classifiers and methodologies.


Introduction
VT is a rapid, abnormal cardiac rhythm. It is defined as three or more consecutive ventricular beats at a rate of more than 100 beats per minute. If VT lasts for more than a few seconds, it can be fatal. Because the heart beats so fast, the ventricles lose synchronization, and VT can degenerate into ventricular fibrillation [1]. Symptoms of VT include cardiac arrest, chest pain and shortness of breath. PVC is one of the ventricular arrhythmias; it is an irregular cardiac rhythm whose symptoms include fluttering and skipped beats. The ST segment represents the interval between ventricular depolarization and ventricular repolarization. Myocardial infarction is the most common cause of ST change abnormalities, which appear as ST depression or elevation [1]. Over the last three decades, the scientific community has introduced a number of algorithms to detect cardiac arrhythmias such as VT, PVC and ST changes from ECG signals.
The electrocardiogram (ECG) is a diagnostic instrument that monitors and records the electrical activity of the human heart [2]. The ECG is useful for identifying the source of chest pain and for detecting irregular heart rhythms and other cardiac abnormalities. A healthy heart produces a characteristic ECG, and any heart rhythm irregularity can alter the shape of the ECG signal [2]. The standard 12-lead ECG measures the electrical potentials of 10 electrodes placed on the body surface, six on the chest and four on the limbs. Early diagnosis is necessary to provide efficient care for arrhythmias [3]. Each cardiac cycle contains three waves: the P wave, the QRS complex and the T wave [4]. ECG arrhythmia detection is an important part of identifying different cardiac illnesses, and effective, precise arrhythmia diagnosis allows doctors to diagnose various heart disorders. Detecting arrhythmia from the ECG is nevertheless very difficult. The difficulty arises from the variability of the ECG waveform between individuals, the fact that the same disease can produce different ECG waveforms in different patients while two different diseases can produce roughly similar ECG waveforms, the inconsistency of ECG characteristics, and the lack of an effective algorithm for ECG beat classification [5].
Several detection techniques for cardiovascular diseases have been presented in recent years. Most of these methods consist of four steps: preprocessing (de-noising), dimensionality reduction, feature selection and classification of the different cardiac arrhythmias. Irena et al. [6] used discriminant analysis to detect ventricular fibrillation in MIT-BIH ECG signals and achieved an average sensitivity of 94.1 % and an average specificity of 93.8 %. Qiao et al. [7] proposed a Support Vector Machine (SVM) with 14 metrics for the detection of ventricular fibrillation and ventricular tachycardia and reported an average detection accuracy of 95 %. Tratning et al. applied a Signal Comparison Algorithm (SCA) to the detection of VT on publicly available annotated datasets and reported an average detection accuracy of 96.2 %, a sensitivity of 71.2 % and a specificity of 98.5 % [8]. Shweta et al. used a hybrid of Particle Swarm Optimization (PSO) and a Feed Forward Neural Network (FFNN) for ECG beat detection and reported an overall detection accuracy of 97 % [2]. Celin et al. compared SVM, Adaboost, ANN and Naïve Bayes classifiers for ECG signal classification, and the Naïve Bayes classifier achieved the highest accuracy of 99.7 % [9]. Dikera et al. [10] used a Genetic Algorithm and a Kernel Extreme Learning Machine (KELM) to detect arrhythmias from ECG signals and achieved an accuracy of 95 %, a sensitivity of 100 % and a specificity of 80 %. Jing et al. proposed fast compression residual convolutional neural networks (FCResNN) for the automated diagnosis of heart diseases from MIT-BIH ECG signals and achieved an accuracy of 98.79 % [11]. Martis et al. [12] used the Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) to extract ECG signal features and classified five classes of cardiac arrhythmias with an SVM-RBF classifier under 10-fold cross-validation, reaching an average classification accuracy of 96.92 %. Trans et al. [13] used a fuzzy hybrid neural network with Higher Order Spectra (HOS) features to recognize seven classes of ECG beats and reported an overall recognition accuracy of 96.06 %. Sukanta et al. [14] used the DWT with a neural network classifier to classify four classes of cardiac abnormalities under 10-fold cross-validation and reported an average classification accuracy of 96.67 %.
Rizal et al. [15] used Hjorth descriptors with Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) classifiers to classify three classes of ECG signals and achieved a best average accuracy of 93.3 % under 10-fold cross-validation. Martis et al. [16] proposed bispectrum features with PCA and an SVM-RBF classifier for the detection of five classes of ECG abnormalities and reported an average detection accuracy of 93.48 % under 10-fold cross-validation. Martis et al. [17] also used higher-order cumulants with PCA and a Neural Network (NN) to classify five classes of cardiac abnormalities, achieving an average overall classification accuracy of 94.52 % under 10-fold cross-validation. Nazmy et al. [18] proposed ICA and power spectrum features with FFNN, FIS and ANFIS classifiers to classify six types of ECG abnormalities, and the best accuracy of 97.1 % was achieved with ANFIS. Dingfei et al. [19] used autoregressive modelling with the GLM algorithm to classify six classes of ECG signals and achieved an average overall classification accuracy of 93.2 %.
Dhiah et al. [20] proposed a particle swarm optimization classifier with a chi-square distance for arrhythmia classification, achieving a best detection accuracy of 98 %. Masud et al. [21] used the Pan-Tompkins algorithm and time-domain HRV features to extract short-term atrial fibrillation signal characteristics, which were classified with an Adaboost classifier under 5-fold cross-validation, resulting in an average classification accuracy of 91 %. Shikha et al. [22] applied the three-dimensional discrete wavelet transform (3D DWT) to ECG abnormalities with a support vector machine (SVM) classifier, achieving an average classification accuracy of 99 %. Alba et al. [23] proposed FIR filtering together with a KNN classifier for classifying the ECG family, attaining a highest detection accuracy of 89 %. Mohamed et al. [24] presented a deep neural network model with residual blocks for the detection of six classes of ECG abnormalities and found an average detection accuracy of 99.51 %. Manas et al. [25] proposed a scalar invariant transform with deep neural network classifiers and 5-fold cross-validation to classify three ECG abnormalities, recording a high detection accuracy of 99.78 %.
G.S. Manivannan et al.
Although the aforementioned machine learning-based algorithms have considerably improved the accuracy of cardiovascular disease diagnosis, they still have to contend with high dimensionality, irrelevant characteristics, missing data and redundancy. Consequently, a machine learning-based system that can effectively detect individuals with cardiac diseases still needs to be developed. Moreover, none of the studies mentioned above has used hyperparameter tuning to improve the accuracy of cardiovascular disease diagnosis. Tuning the hyperparameters of a machine learning classifier is an effective way to improve its performance. Hyperparameters are configured by the data analyst before the learning procedure and are not learned from the data. Several hyperparameter values are tried, and the results are compared to find the best setting. Hyperparameter tuning is therefore mostly based on experimental outcomes rather than theoretical results [26]. People expect high-quality treatment and services in the medical field [27]. Therefore, the main goal of this study is to use the GSO and Adam approaches to make the GMM, EM, BDLC and Firefly classification algorithms more effective. Running the GSO and Adam approaches makes it possible to choose optimal hyperparameter values for these classification algorithms, and using these optimized hyperparameters improves the performance of the cardiovascular disease detection method. The following are summaries of the study contributions. The organization of the work is as follows. Section 2 describes the materials and methods, section 3 discusses the dimensionality reduction techniques, section 4 deals with the feature selection optimization methods, and section 5 explains the classifiers used. Finally, sections 6, 7, 8 and 9 present training and testing, the hyperparameter tuning methodologies, the results and discussion, and the conclusion, respectively.

Materials and methods
The raw ECG signals are drawn from different cardiac class databases of MIT-BIH (Physionet). Four databases are used in this work: the VT database, the PVC database, the ST Change database and the NSR database. The arrhythmia recordings are digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range [28]. In total, 74 subjects with 148 recordings were used. The MIT-BIH Ventricular Tachycardia (VT) database consists of 12 subjects, each with two records (ML I, V1), for a total of 24 recordings. The MIT-BIH Premature Ventricular Contraction (PVC) database consists of 16 subjects, each with two records (ML I and one of V1, V4 or V5), for a total of 32 recordings. The MIT-BIH ST Change (ST) database consists of 28 subjects, each with two records (ECG1, ECG2), for a total of 56 recordings. The MIT-BIH Normal Sinus Rhythm (NSR) database consists of 18 subjects, each with two records (ECG1, ECG2), for a total of 36 recordings. These subjects and recordings provide enough normal, VT, PVC and ST arrhythmia beats for this work. The sampling frequency of the VT, PVC and ST ECG signals is 360 Hz, and that of the normal ECG signals is 128 Hz. The details of the MIT-BIH databases used in this work are shown in Table 1. ECG data are high-dimensional and occupy a large amount of memory. The fundamental goal of dimensionality reduction techniques is to transform a high-dimensional data space into a low-dimensional one. Ideally, the dimensionality of the reduced representation should equal the intrinsic dimensionality of the original data. Dimensionality reduction is significant in many applications because it suppresses undesirable characteristics and mitigates the curse of dimensionality [29]. In this work, the data from the four classes were used to construct three classification problems: VT vs. NSR, PVC vs. NSR and ST vs. NSR. The overall methodology for automated detection of different cardiac arrhythmias is shown in Fig. 1.
For dimensionality reduction and feature selection, the cardiac arrhythmia signals are divided into epochs. The VT signal is divided into 43333 epochs of 360 samples each, the PVC signal into 57778 epochs of 360 samples each, the ST Change signal into 84000 epochs of 360 samples each, and the NSR signal into 141750 epochs of 128 samples each.
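A minimal sketch of the epoch segmentation, assuming non-overlapping epochs and that an incomplete tail is discarded (the text does not say how leftover samples are handled). `split_into_epochs` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

def split_into_epochs(signal, epoch_len):
    """Split a 1-D signal into non-overlapping epochs of epoch_len samples.

    Assumption: samples that do not fill a complete final epoch are dropped.
    """
    signal = np.asarray(signal)
    n_epochs = len(signal) // epoch_len
    return signal[: n_epochs * epoch_len].reshape(n_epochs, epoch_len)

# 360-sample epochs for the 360 Hz records, 128-sample epochs for NSR (128 Hz)
vt_like = split_into_epochs(np.zeros(3600), 360)    # 10 epochs
nsr_like = split_into_epochs(np.zeros(1280), 128)   # 10 epochs
```

With 360-sample epochs, 43333 full VT epochs correspond to at least 43333 × 360 = 15,599,880 samples.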

Local Linear Embedding (LLE)
LLE is a local non-linear dimensionality reduction technique. It preserves the local properties of the data as well as its global layout, and it is comparatively insensitive to short-circuiting in the neighbourhood graph. The local properties of the data manifold are captured by writing each data point as a linear combination of its k nearest neighbours [30]. The LLE method has three essential steps: a neighbourhood is constructed for each data point; the weights that linearly reconstruct each point from its neighbourhood are calculated; and, finally, the low-dimensional coordinates that are best reconstructed by these weights are found. The input to the LLE algorithm is a d × j data matrix W [31]. Fig. 2 shows the Cumulative Distribution Function (CDF) plot of the LLE features for the VT, PVC, ST and Normal cases. As Fig. 2 shows, the features of all the classes overlap and are non-Gaussian and non-linear; robust classifiers are therefore advised for better results. The three steps are:
i. For each data point $\vec{z}_i$, the $j$ nearest neighbours are found.
ii. The weights $m_{ik}$ that best reconstruct $\vec{z}_i$ from its neighbours are computed by minimizing
$$\varepsilon(M) = \sum_i \Big\lVert \vec{z}_i - \sum_k m_{ik}\,\vec{z}_k \Big\rVert^{2}, \qquad \sum_k m_{ik} = 1,$$
where $m_{ik} = 0$ whenever $\vec{z}_k$ is not one of the neighbours of $\vec{z}_i$.
iii. The low-dimensional data representation $Q$, with points $\vec{q}_i$, is found by minimizing the cost function
$$\phi(Q) = \sum_i \Big\lVert \vec{q}_i - \sum_k m_{ik}\,\vec{q}_k \Big\rVert^{2},$$
which is equivalent to finding the eigenvectors of $(I - M)^{T}(I - M)$ associated with its smallest non-zero eigenvalues.
Here $\sum_k q_{lk} = 0$ for each $l$ and $Q^{T}Q = K$, and $M$ represents the matrix of nearest-neighbour weights.
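A minimal NumPy sketch of the three LLE steps, written for illustration and not the authors' implementation. The function name `lle`, the regularization term `reg` added to the local Gram matrix (a common practical safeguard when there are more neighbours than input dimensions, assumed here) and the default sizes are all assumptions. The returned columns are orthonormal eigenvectors, mirroring the orthogonality constraint on Q.

```python
import numpy as np

def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Locally Linear Embedding of X (n_points x n_features)."""
    n = X.shape[0]
    # Step i: nearest neighbours from pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Step ii: reconstruction weights m_ik, each row constrained to sum to one
    M = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                     # neighbours centred on point i
        C = Z @ Z.T                               # local Gram matrix
        C += np.eye(n_neighbors) * (reg * np.trace(C) + 1e-12)  # regularize
        w = np.linalg.solve(C, np.ones(n_neighbors))
        M[i, nbrs[i]] = w / w.sum()
    # Step iii: bottom eigenvectors of (I - M)^T (I - M)
    I = np.eye(n)
    cost = (I - M).T @ (I - M)
    _, vecs = np.linalg.eigh(cost)                # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]            # drop the bottom (constant) eigenvector
```

For example, `lle(X, n_neighbors=8, n_components=2)` maps a matrix of epoch feature vectors to two LLE coordinates per epoch.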

Diffusion maps (DM)
Diffusion Maps (DM) is a global non-linear dimensionality reduction technique. It constructs a Markov random walk on a graph of the data. Running the random walk for a number of time steps yields a measure of the proximity of the data points, called the diffusion distance. DM preserves the pairwise diffusion distances as closely as possible in a low-dimensional representation of the data [30]. The first step is to construct a graph of the data. The weights of the graph edges are calculated with a Gaussian kernel function, resulting in a matrix $a$ with entries
$$a_{kl} = \exp\!\left(-\frac{\lVert g_k - g_l \rVert^{2}}{2\sigma^{2}}\right)$$
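where σ is the Gaussian variance. The matrix $a$ is then normalized so that each row sums to one, giving the matrix $b^{(1)}$:
$$b^{(1)}_{kl} = \frac{a_{kl}}{\sum_{g} a_{kg}} \qquad (4)$$
Because diffusion maps originate from dynamical systems theory, the resulting matrix $b^{(1)}$ is a Markov matrix, i.e. the forward transition probability matrix of a dynamical process: $b^{(1)}_{kl}$ is the probability of a transition from one data point to another in a single time step. The transition probability matrix for $t$ time steps is
$$b^{(t)} = \left(b^{(1)}\right)^{t}$$
and the random walk over $t$ steps defines the diffusion distance
$$D^{(t)}(g_k, g_l) = \sqrt{\sum_{r} \frac{\left(b^{(t)}_{kr} - b^{(t)}_{lr}\right)^{2}}{\Psi(g_r)^{(0)}}}$$
with
$$\Psi(g_r)^{(0)} = \frac{c_r}{\sum_{s} c_s}, \qquad c_r = \sum_{s} a_{rs} \qquad (7)$$
where $\Psi(g_r)^{(0)}$ assigns more weight to high-density parts of the graph and $c_r$ is the degree of node $g_r$. The largest eigenvalue of $b^{(t)}$ is trivial ($\lambda_1 = 1$), so its eigenvector $m_1$ is discarded, and the low-dimensional representation 'S' is given by the next $h$ principal eigenvectors:
$$S = \{\lambda_2 m_2,\; \lambda_3 m_3,\; \dots,\; \lambda_{h+1} m_{h+1}\}$$
where h is the number of principal eigenvectors retained. Fig. 3 shows histograms of the diffusion map features for the VT, PVC, ST and normal cases. Compared with the normal cases, the histogram for the VT cases is non-Gaussian and skewed, and Fig. 3 also clearly shows that the histogram variables overlap.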
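A minimal NumPy sketch of the diffusion map construction described above, for illustration only: the Gaussian kernel, the row-normalized Markov matrix b^(1), and the embedding from the non-trivial eigenvectors scaled by λ^t. To keep the eigendecomposition symmetric, it diagonalizes the conjugate matrix D^(-1/2) a D^(-1/2) (D being the diagonal matrix of row sums), which has the same eigenvalues as b^(1); this trick, the function name and the default σ are assumptions.

```python
import numpy as np

def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Diffusion map embedding of X (n_points x n_features)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    a = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights a_kl
    deg = a.sum(axis=1)                           # row sums; b^(1) = a / deg[:, None]
    d_isqrt = 1.0 / np.sqrt(deg)
    sym = d_isqrt[:, None] * a * d_isqrt[None, :] # symmetric conjugate of b^(1)
    vals, vecs = np.linalg.eigh(sym)
    order = np.argsort(vals)[::-1]                # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    psi = d_isqrt[:, None] * vecs                 # right eigenvectors of b^(1)
    # Skip the trivial eigenvalue 1 and scale the next ones by lambda^t (t-step walk)
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

Each returned column is still an eigenvector of b^(1), so larger t only shrinks the coordinates of components with smaller eigenvalues.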

Laplacian Eigenmaps (LE)
Laplacian Eigenmaps (LE) is closely related to LLE in that it prioritizes preserving the local properties of the manifold, which allows it to find a low-dimensional representation of the data conveniently. The local properties are evaluated from the pairwise distances between neighbours: LE computes a low-dimensional representation of the dataset in which the distances between each data point and its nearest neighbours are minimized [32]. The LE algorithm begins by building a neighbourhood graph 'S' in which each data point $g_n$ is connected to its 'm' nearest neighbours. For all points $g_n$ and $g_t$ in graph 'S' that are connected by an edge, the edge weight is evaluated with the Gaussian kernel function
$$a_{nt} = \exp\!\left(-\frac{\lVert g_n - g_t \rVert^{2}}{2\sigma^{2}}\right)$$
where σ is the Gaussian variance, resulting in a sparse adjacency matrix A. The cost function minimized to compute the low-dimensional projections $f_n$ is [30]
$$\phi(F) = \sum_{n,t} \lVert f_n - f_t \rVert^{2}\, a_{nt} \qquad (9)$$
In this cost function, close data points $g_n$ and $g_t$ have large weights $a_{nt}$, so the cost function is heavily influenced by the distance between their low-dimensional representations $f_n$ and $f_t$. The minimization problem can be formulated as an eigenproblem by computing the degree matrix 'P' and the Laplacian graph 'E' of graph A. P is a diagonal matrix whose entries are the row sums of A, that is,
$$p_{nn} = \sum_{t} a_{nt} \qquad (10)$$
and the Laplacian graph is computed as E = P − A.
Therefore, equation (9) can be rewritten as
$$\phi(F) = \sum_{n,t} \lVert f_n - f_t \rVert^{2}\, a_{nt} = 2\,\operatorname{tr}\!\left(F^{T} E F\right)$$
so minimizing $\phi(F)$ is equivalent to minimizing $F^{T} E F$. This is solved as the generalized eigenvector problem $E x = \lambda P x$. The low-dimensional data representation 'F' is formed by the 'l' eigenvectors $x_i$ that correspond to the smallest non-zero eigenvalues. The statistical parameters Mean (μ), Standard Deviation (σ), Variance (σ²), Skewness (skew), Kurtosis (C), Pearson Correlation Coefficient (PCC), Approximate Entropy (ApEn), Renyi Entropy (ReEn) and Permutation Entropy (PeEn) of the features obtained with the different dimensionality reduction techniques for the VT, PVC, ST and Normal cases are presented in Table 2.
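A minimal NumPy sketch of Laplacian Eigenmaps in the notation above (adjacency A, degree matrix P, Laplacian E = P − A), for illustration only. The generalized problem E x = λ P x is solved through the symmetric matrix P^(-1/2) E P^(-1/2), an equivalent reformulation chosen here so that only `numpy.linalg.eigh` is needed; the function name, the symmetrization of the k-nearest-neighbour graph and the defaults are assumptions.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=8, n_components=2, sigma=1.0):
    """Laplacian Eigenmaps embedding of X (n_points x n_features)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                         # no self-edges
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma ** 2))  # Gaussian edge weights a_nt
    A = np.maximum(A, A.T)                               # make the graph undirected
    p = A.sum(axis=1)                                    # diagonal of the degree matrix P
    E = np.diag(p) - A                                   # graph Laplacian E = P - A
    p_isqrt = 1.0 / np.sqrt(p)
    L_sym = p_isqrt[:, None] * E * p_isqrt[None, :]      # P^-1/2 E P^-1/2
    _, vecs = np.linalg.eigh(L_sym)                      # ascending eigenvalues
    F = p_isqrt[:, None] * vecs                          # solutions x of E x = lambda P x
    return F[:, 1:n_components + 1]                      # skip the trivial zero eigenvalue
```

The substitution x = P^(-1/2) v turns E x = λ P x into P^(-1/2) E P^(-1/2) v = λ v, which is why mapping the eigenvectors back with `p_isqrt` gives the generalized eigenvectors.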
The normal probability plot in Fig. 4 shows non-linear, overlapping curves, and the CDF curves in Fig. 2 likewise have no well-defined shape. Table 2 summarizes the average statistical parameters for the VT, PVC, ST and Normal cases under the various dimensionality reduction techniques. All the statistical parameters and entropies overlap between the VT, PVC, ST and Normal cases, and the large values of kurtosis and variance indicate non-linearity. These estimated parameters imply that a feature selection step is needed before the features are processed further.
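A sketch of how some of the Table 2 statistics could be computed for one epoch, for illustration only. Kurtosis is taken as the plain (non-excess) fourth standardized moment, the PCC is computed against a caller-supplied reference epoch, and permutation entropy uses order 3, delay 1 and base-2 logarithms normalized by log2(3!); all of these are assumptions, since the text does not give the exact definitions. Approximate and Renyi entropy are omitted for brevity.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (Bandt-Pompe ordinal patterns), in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    counts = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay], kind="stable"))
        for i in range(n)
    )
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-np.sum(p * np.log2(p)) / math.log2(math.factorial(order)))

def epoch_statistics(epoch, reference):
    """Some of the Table 2 statistics for one epoch (definitions assumed, see above)."""
    e = np.asarray(epoch, dtype=float)
    mu, sd = e.mean(), e.std()
    z = (e - mu) / sd
    return {
        "mean": mu,
        "std": sd,
        "variance": sd ** 2,
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),          # non-excess kurtosis
        "pcc": float(np.corrcoef(e, reference)[0, 1]),
        "perm_entropy": permutation_entropy(e),
    }
```

A monotonic epoch has a single ordinal pattern and therefore zero permutation entropy, while white noise approaches the maximum value of 1.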

Feature selection
The dimensionally reduced ECG epoch values are passed to the Cuckoo Search (CS) and Harmonic Search Optimization (HSO) algorithms for feature selection.

Cuckoo Search (CS)
Some features in the ECG datasets can mislead the classifiers' predictions and lead to incorrect classification. To enhance the classification accuracy, feature selection methods can be employed to choose informative features. Cuckoo Search (CS) is a well-known nature-inspired metaheuristic optimization algorithm. The CS algorithm is simple and efficient, and its search follows random (Lévy flight) paths. It has been applied to a wide range of engineering design problems. Cuckoos lay their eggs in the nests of host birds of other species. If the host bird discovers that the eggs are not its own, it either destroys the eggs or abandons the nest entirely. As a result, cuckoo eggs have evolved to resemble those of the host birds. The CS algorithm is based on the following three idealized rules [33]:
a. Each cuckoo lays one egg at a time and deposits it in a randomly selected nest.
b. The best nests with the highest-quality eggs are carried over to the next generations.
c. The number of available host nests is fixed, and the host bird discovers a cuckoo's egg with probability $f_a \in [0, 1]$.
The basic CS algorithm follows these rules. The Lévy flight models the random flight behaviour of birds and is used to determine the next position $f_i^{(t+1)}$ from the current position $f_i^{(t)}$:
$$f_i^{(t+1)} = f_i^{(t)} + \gamma \oplus \mathrm{Levy}(\beta)$$
where γ > 0 is the step-size scaling factor and ⊕ denotes entry-wise multiplication. The Lévy step lengths follow a heavy-tailed power-law distribution with exponent β, which has infinite variance with an infinite mean. Here γ = 1 and β = 0.2 were chosen by trial and error based on MSE values.
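A minimal NumPy sketch of continuous cuckoo search, for illustration. Lévy steps are drawn with Mantegna's algorithm, a standard way of generating Lévy-stable steps, and the defaults (β = 1.5, γ = 0.01, f_a = 0.25, and scaling each step by the distance to the current best nest) follow common cuckoo-search implementations rather than the paper's γ = 1 and β = 0.2; all names and defaults are assumptions. Using it for feature selection would additionally require an objective that scores a feature subset, for example by thresholding each coordinate of a nest to obtain a binary mask; that wrapper is not shown.

```python
import math

import numpy as np

def levy_steps(beta, shape, rng):
    """Lévy-distributed step lengths via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def _greedy_update(nests, fitness, candidates, objective):
    """Keep each candidate only if it beats the current nest (rule b), in place."""
    cand_fit = np.array([objective(c) for c in candidates])
    better = cand_fit < fitness
    nests[better] = candidates[better]
    fitness[better] = cand_fit[better]

def cuckoo_search(objective, dim, lower, upper, n_nests=15, fa=0.25,
                  gamma=0.01, beta=1.5, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    nests = rng.uniform(lower, upper, (n_nests, dim))
    fitness = np.array([objective(x) for x in nests])
    for _ in range(n_iter):
        best = nests[np.argmin(fitness)].copy()
        # Lévy flight f_i(t+1) = f_i(t) + gamma (x) Levy(beta), scaled by distance to best
        flights = nests + gamma * levy_steps(beta, nests.shape, rng) * (nests - best)
        _greedy_update(nests, fitness, np.clip(flights, lower, upper), objective)
        # Rule c: a fraction fa of egg components is discovered and replaced by a random walk
        found = rng.random(nests.shape) < fa
        walk = rng.random() * (nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)])
        _greedy_update(nests, fitness, np.clip(nests + walk * found, lower, upper), objective)
    i = np.argmin(fitness)
    return nests[i], fitness[i]
```

For example, `cuckoo_search(lambda v: float(np.sum(v ** 2)), dim=3, lower=-5.0, upper=5.0)` searches for the minimum of a simple quadratic test function.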

Harmonic search optimization (HSO)
When harmonic search optimization (HSO) is applied, for example to clustering [34], the optimization problem, which is otherwise unconstrained apart from the bounds on the decision variables, is written as follows:
$$\min\; e(Y) \quad \text{subject to} \quad Py_j \le y_j \le Wy_j, \qquad j = 1, \dots, n \qquad (15)$$
where e(Y) is the objective function, Y is the vector of decision variables, $y_j$ is the j-th decision variable, and $Py_j$ and $Wy_j$ are the lower and upper bounds of the j-th decision variable. The standard harmonic search procedure consists of the five steps below.
Step 1: Initialize the problem and algorithm parameters. In this step, the problem parameters n, $Py_j$ and $Wy_j$ are initialized.
The four algorithm parameters are also initialized: the Harmonic Size (HS), i.e. the number of solution vectors held in the harmonic memory, the Harmonic Memory Considering Rate (HCR), the Pitch Adjusting Rate (PAR), and the Stopping Criterion (SC), i.e. the maximum number of improvisations ($V_{max}$).
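Step 2: Initialize the harmonic memory. Harmonic Search is similar to genetic algorithms (GA): both are population-based optimization algorithms, but in HS the population is called the Harmonic Memory (HM) and is built from solution vectors. The Harmonic Memory (HM) is represented as follows [35]:
$$\mathrm{HM} = \begin{bmatrix} y_1^{1} & y_2^{1} & \cdots & y_n^{1} \\ y_1^{2} & y_2^{2} & \cdots & y_n^{2} \\ \vdots & \vdots & \ddots & \vdots \\ y_1^{HS} & y_2^{HS} & \cdots & y_n^{HS} \end{bmatrix}, \qquad y_j^{i} = Py_j + r\,(Wy_j - Py_j)$$
where $y_j^{i}$ denotes the j-th decision variable of the i-th solution vector and r is a random number with r ∈ [0, 1].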
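Step 3: Improvise a new harmonic. A new harmonic vector is generated using three rules: harmonic memory consideration, pitch adjustment and randomization.
The harmonic memory considering rate (HCR) determines the probability that a value is taken from the harmonic memory, while the PAR determines the probability that such a value is pitch-adjusted:
$$y_j^{new} \leftarrow \begin{cases} y_j^{new} \in \{y_j^{1}, y_j^{2}, \dots, y_j^{HS}\} & \text{with probability } HCR \\ y_j^{new} \in [Py_j, Wy_j] & \text{with probability } 1 - HCR \end{cases}$$
$$y_j^{new} \leftarrow y_j^{new} \pm r \cdot bw \quad \text{with probability } PAR$$
where $y_j^{new}$ is the j-th variable of the new harmonic vector, bw is the bandwidth and r ∈ [0, 1] is a random number.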
Step 4: Update the HM. If the new harmonic vector has a better objective function value than the worst harmonic vector in the HM, the new vector replaces the worst one.
Step 5: Check the stopping criterion. If it is met, the iteration finishes; otherwise, Steps 3 and 4 are repeated.
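A minimal NumPy sketch of the five HSO steps for a bounded minimization problem, for illustration only. The function name, the defaults (HS = 10, HCR = 0.9, PAR = 0.3, bw = 0.05) and the use of a fixed number of improvisations V_max as the stopping criterion are assumptions; as with CS, feature selection would additionally need an objective that scores a feature subset, which is not shown.

```python
import numpy as np

def harmony_search(objective, lower, upper, hs=10, hcr=0.9, par=0.3, bw=0.05,
                   v_max=3000, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n = lower.size
    # Step 2: harmonic memory, y_j^i = Py_j + r (Wy_j - Py_j)
    hm = lower + rng.random((hs, n)) * (upper - lower)
    cost = np.array([objective(h) for h in hm])
    for _ in range(v_max):                     # Step 5: stop after V_max improvisations
        new = np.empty(n)
        for j in range(n):                     # Step 3: improvise a new harmonic
            if rng.random() < hcr:             # harmonic memory consideration
                new[j] = hm[rng.integers(hs), j]
                if rng.random() < par:         # pitch adjustment: +/- r * bw
                    new[j] += (2.0 * rng.random() - 1.0) * bw
            else:                              # randomization within [Py_j, Wy_j]
                new[j] = lower[j] + rng.random() * (upper[j] - lower[j])
        new = np.clip(new, lower, upper)
        c = objective(new)
        worst = np.argmax(cost)
        if c < cost[worst]:                    # Step 4: replace the worst harmonic
            hm[worst], cost[worst] = new, c
    best = np.argmin(cost)
    return hm[best], cost[best]
```

Step 1 corresponds to the function arguments: the bounds give n, Py_j and Wy_j, and the keyword parameters give HS, HCR, PAR and V_max.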
Table 3 shows the average statistical parameters obtained with the different dimensionality reduction techniques followed by CS and HSO feature selection for the VT, PVC, ST and Normal cases. It shows that CS and HSO feature selection eliminate the overlapping features. Although non-linearity remains, the selected features allow a group of classifiers to achieve better class separation.

Classifiers for different cardiac arrhythmias detection
Dimensionally reduced ECG epoch values and epoch values from CS and HSO feature selection are fed into classifiers for detecting different cardiac arrhythmias.The classifiers used include Gaussian Mixture Model (GMM), Expectation Maximization (EM), Nonlinear Regression (NLR), Logistic Regression (LR), Bayesian Linear Discriminant Analysis (BDLC), Detrended Fluctuation Analysis (Detrended FA) and Firefly.

Gaussian Mixture Model (GMM)
A Gaussian mixture model represents the probability density function (PDF) of a random variable as a weighted sum of n Gaussian component densities:
$$p(q \mid r) = \sum_{k=1}^{n} \beta_k\, \mathcal{N}(q \mid \mu_k, \Sigma_k)$$
where q is the D-dimensional data vector, r denotes the mixture model, and each component is a D-variate Gaussian density
$$\mathcal{N}(q \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} \lvert \Sigma_k \rvert^{1/2}} \exp\!\left(-\frac{1}{2}(q - \mu_k)^{T} \Sigma_k^{-1} (q - \mu_k)\right)$$
with mean vector $\mu_k$ and covariance matrix $\Sigma_k$. The mixture weights $\beta_k$ satisfy the condition $\sum_{k=1}^{n} \beta_k = 1$. The GMM parameters are most often estimated from the training data with the iterative Expectation Maximization (EM) algorithm, although maximum a posteriori (MAP) estimation is sometimes used. The complete GMM is parameterized by the mean vectors, covariance matrices and mixture weights of all the component densities, which are represented collectively as [36]
$$r = \{\beta_k, \mu_k, \Sigma_k\}, \qquad k = 1, \dots, n$$
The covariance matrices $\Sigma_k$ can be full, or they can be constrained to be diagonal. The GMM configuration is often determined by the amount of data available to estimate the GMM parameters. Note that full covariance matrices are not needed even when the features are not statistically independent, because a linear combination of diagonal-covariance Gaussians can model the correlations between the feature vector elements; the effect of n full-covariance Gaussians can equally be obtained with a larger set of diagonal-covariance Gaussians. The parameters are estimated by maximum likelihood. Assuming the vectors are independent, the GMM likelihood for a sequence of G training vectors $Q = \{q_1, \dots, q_G\}$ is
$$p(Q \mid r) = \prod_{g=1}^{G} p(q_g \mid r)$$
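This likelihood is a non-linear function of the parameters r, so direct maximization is not possible. However, the maximum likelihood parameters can be obtained iteratively with a special case of the expectation maximization (EM) algorithm. The basic idea of EM is to start from an initial model r and estimate a new model $\bar{r}$ such that $p(Q \mid \bar{r}) \ge p(Q \mid r)$. The mixture weights are updated as
$$\bar{\beta}_k = \frac{1}{G} \sum_{g=1}^{G} \Pr(k \mid q_g, r)$$
the means as
$$\bar{\mu}_k = \frac{\sum_{g=1}^{G} \Pr(k \mid q_g, r)\, q_g}{\sum_{g=1}^{G} \Pr(k \mid q_g, r)}$$
and the diagonal covariances as
$$\bar{\sigma}_k^{2} = \frac{\sum_{g=1}^{G} \Pr(k \mid q_g, r)\, q_g^{2}}{\sum_{g=1}^{G} \Pr(k \mid q_g, r)} - \bar{\mu}_k^{2}$$
where $\sigma_k^{2}$, $q_g$ and $\mu_k$ refer to arbitrary elements of the corresponding vectors. Finally, the posterior probability of component k is given by
$$\Pr(k \mid q_g, r) = \frac{\beta_k\, \mathcal{N}(q_g \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{n} \beta_m\, \mathcal{N}(q_g \mid \mu_m, \Sigma_m)}$$
where $\mathcal{N}(q \mid \mu_k, \Sigma_k)$ denotes the k-th Gaussian component density.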
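A minimal NumPy sketch of the EM updates above for a diagonal-covariance GMM, for illustration. The initialization (means spread along the sorted first feature, every variance set to the overall data variance), the small variance floor `eps`, the fixed iteration count and the log-domain E-step are all assumptions rather than details from the text.

```python
import numpy as np

def gmm_em(Q, n_components=2, n_iter=100, eps=1e-6):
    """EM for a diagonal-covariance GMM; Q has shape (G, D)."""
    G, D = Q.shape
    beta = np.full(n_components, 1.0 / n_components)          # mixture weights
    order = np.argsort(Q[:, 0])                               # spread initial means
    mu = Q[order[np.linspace(0, G - 1, n_components).astype(int)]].astype(float)
    var = np.tile(Q.var(axis=0) + eps, (n_components, 1))    # diagonal variances
    for _ in range(n_iter):
        # E-step: log(beta_k * N(q_g | mu_k, diag(var_k))) for every pair (g, k)
        log_dens = -0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                           + ((Q[:, None, :] - mu) ** 2 / var).sum(axis=2))
        log_w = np.log(beta) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)             # avoid underflow
        post = np.exp(log_w)
        post /= post.sum(axis=1, keepdims=True)               # Pr(k | q_g, r)
        # M-step: mixture weights, means and diagonal variances
        nk = post.sum(axis=0)
        beta = nk / G
        mu = (post.T @ Q) / nk[:, None]
        var = (post.T @ Q ** 2) / nk[:, None] - mu ** 2 + eps
    return beta, mu, var
```

The M-step lines are direct transcriptions of the weight, mean and variance updates, with `post` holding Pr(k | q_g, r).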

Expectation maximization (EM)
Expectation maximization (EM) is an iterative method for maximizing likelihood functions, particularly in problems with missing data. In general, the EM algorithm consists of two steps: (i) the Expectation step (E) and (ii) the Maximization step (M) [37].

(i) Expectation Step (E)
Given the observed data and the current approximation of the parameters, the expected value of the missing data $g_1$ can be computed. For a given measurement $s_1$ and the current approximation of the parameters, the expected value of $g_1$ is calculated as follows.
(ii) Maximization Step (M)
After the expectation step, the data thus estimated are used to calculate the maximum likelihood estimate of the parameters. A collection of unit vectors is described as
The above likelihood can also be written as (36). To obtain the maximum likelihood parameters μ and k, the equation is optimized with a Lagrange multiplier v, and the modified equation can then be written as follows. The parameter estimates are obtained by differentiating this equation with respect to μ, v and k and setting the derivatives to 0. Because both the observed data and the current approximation of the model parameters are given, the missing (hidden) data are calculated first, in the expectation step. This is done with the conditional expectation, which explains the name of the step. In the maximization step, the likelihood function is maximized under the assumption that the missing data are known, and the estimates of the missing data from the expectation step are used in place of the actual missing data.

Non-linear regression (NLR)
Non-linear models are sometimes required to describe real-world phenomena for which linear models are insufficient. All such regression models share the basic form
$$s = g(a) \qquad (41)$$
Linear regression relates two variables through a straight line of the form s = ja + d, whereas non-linear regression (NLR) fits a smooth curve. The main target of NLR is to minimize the sum of squares, which measures how far the individual observations deviate from the fitted function [38]. The NLR function s is defined as follows [39]. It relates a set of independent variables (a) to the observed dependent variable (s); the function g is non-linear in the components of its parameter vector but is otherwise arbitrary. For example, the following NLR model is a non-linear function of γ.
The non-linear regression (NLR) model is written as
$$s_m = g(a_m, \gamma) + p_m, \qquad m = 1, 2, \dots, M$$
where g denotes the expectation function, $a_m$ denotes the independent variable (design vector) and $p_m$ is a random disturbance. In a non-linear model, at least one of the derivatives of the expectation function g with respect to the parameters depends on at least one of the parameters. The parameters of the model are $\gamma = (\gamma_1, \dots, \gamma_J)^{T}$, where J is the number of parameters being considered. When analysing a set of data, the vectors $a_m$, m = 1, 2, 3, …, M, are considered fixed, and attention is concentrated on the dependence of the expected responses on γ. The M-vector λ(γ) is then formed, whose m-th element is $\lambda_m(\gamma) = g(a_m, \gamma)$.
The non-linear regression model is then written in vector form as
$$s = \lambda(\gamma) + p$$
where the disturbance vector p has a spherical normal distribution, that is,
$$E[p] = 0, \qquad \operatorname{Var}(p) = E\left[p\,p^{T}\right] = \sigma^{2} I$$
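Geometrically, for a given data vector s, an expectation function $g(a_m, \gamma)$ and a set of design vectors $a_m$, m = 1, …, M, the least squares estimate of γ is the value that minimizes
$$S(\gamma) = \lVert s - \lambda(\gamma) \rVert^{2} = \sum_{m=1}^{M} \left[s_m - g(a_m, \gamma)\right]^{2}$$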
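A minimal NumPy sketch of least-squares NLR fitted with the Gauss-Newton method, for illustration. Both the method and the example expectation function g(a, γ) = γ1(1 − exp(−γ2 a)) are hypothetical choices, not taken from the text; the model is non-linear because its derivative with respect to γ2 depends on the parameters.

```python
import numpy as np

def gauss_newton(g, jac, a, s, gamma0, n_iter=50, tol=1e-12):
    """Minimize S(gamma) = ||s - lambda(gamma)||^2 by Gauss-Newton steps."""
    gamma = np.asarray(gamma0, dtype=float)
    for _ in range(n_iter):
        resid = s - g(a, gamma)                 # s - lambda(gamma)
        jmat = jac(a, gamma)                    # M x J Jacobian of lambda(gamma)
        step, *_ = np.linalg.lstsq(jmat, resid, rcond=None)
        gamma = gamma + step
        if np.linalg.norm(step) < tol:
            break
    return gamma, float(np.sum((s - g(a, gamma)) ** 2))

def g_example(a, gamma):
    """Hypothetical expectation function g(a, gamma) = gamma1 * (1 - exp(-gamma2 * a))."""
    return gamma[0] * (1.0 - np.exp(-gamma[1] * a))

def jac_example(a, gamma):
    """Partial derivatives of g_example with respect to gamma1 and gamma2."""
    e = np.exp(-gamma[1] * a)
    return np.column_stack([1.0 - e, gamma[0] * a * e])
```

Each iteration linearizes λ(γ) around the current estimate and solves the resulting linear least-squares problem for the parameter update.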

Logistic Regression (LR)
Logistic Regression (LR) is among the most widely used classifiers. For a binary classification problem, it predicts the value of a binary variable g ∈ {0, 1}, where 0 denotes the negative class and 1 the positive class [35]. Each outcome $g_m$ (m = 1, 2, …, n) is coded 1 with probability $s_m$ and 0 with probability $1 - s_m$. The probability $s_m$ varies with the feature vector $f_m$ as a function of a parameter vector γ:
$$s_m = \frac{\exp\left(f_m^{T}\gamma\right)}{1 + \exp\left(f_m^{T}\gamma\right)}$$
with the convention that $f_{m0} = 1$, so that $\gamma_0$ acts as the intercept. The logistic (logit) transformation is defined as the logarithm of the odds of the positive outcome [40]:
$$\ln\frac{s_m}{1 - s_m} = f_m^{T}\gamma$$
In matrix form, with F the matrix whose rows are $f_m^{T}$, this reads $\ln\big(s/(1-s)\big) = F\gamma$ element-wise. The log-likelihood and the loss function based on the negative log-likelihood are
$$\ell(\gamma) = \sum_{m=1}^{n} \left[ g_m \ln s_m + (1 - g_m)\ln(1 - s_m) \right], \qquad \mathrm{DEV}(\gamma) = -2\,\ell(\gamma)$$
The loss function is also known as the deviance (DEV). The regularization term $\frac{\alpha}{2}\lVert\gamma\rVert^{2}$ is added to achieve better generalization.
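A minimal NumPy sketch of L2-regularized logistic regression trained by plain gradient descent, for illustration. It minimizes the mean negative log-likelihood plus (α/2)‖γ‖², a rescaled form of the regularized deviance that also penalizes the intercept for simplicity; the optimizer, learning rate, iteration count and function names are assumptions.

```python
import numpy as np

def predict_proba(F, gamma):
    """s_m = 1 / (1 + exp(-f_m^T gamma)), with a leading column f_m0 = 1."""
    F1 = np.column_stack([np.ones(len(F)), F])
    return 1.0 / (1.0 + np.exp(-F1 @ gamma))

def fit_logistic(F, g, alpha=0.1, lr=0.1, n_iter=2000):
    """Gradient descent on mean negative log-likelihood + (alpha/2)||gamma||^2."""
    F1 = np.column_stack([np.ones(len(F)), F])
    gamma = np.zeros(F1.shape[1])
    for _ in range(n_iter):
        s = predict_proba(F, gamma)
        grad = F1.T @ (s - g) / len(g) + alpha * gamma
        gamma -= lr * grad
    return gamma

def deviance(F, g, gamma, eps=1e-12):
    """DEV = -2 * log-likelihood (eps guards against log(0))."""
    s = predict_proba(F, gamma)
    return -2.0 * np.sum(g * np.log(s + eps) + (1 - g) * np.log(1 - s + eps))
```

Class labels are then obtained by thresholding `predict_proba` at 0.5.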

Bayesian discriminant linear Classifier (BDLC)
The BDLC classifier relies on the Bayes decision rule to minimize the probability of error. For a feature vector g, the class with the highest posterior probability is chosen; for two classes m and n, class m is chosen if [41]
$$P(m \mid g) > P(n \mid g)$$
With M as the decision threshold, a discriminant function $s_m(g)$ is defined for each class. Observations of each class are assumed to follow a multivariate normal distribution. If, in addition, all classes are assumed to share the same covariance matrix, the Bayes rule gives the linear discriminant function
$$s_m(g) = g^{T}\Sigma^{-1}\mu_m - \frac{1}{2}\mu_m^{T}\Sigma^{-1}\mu_m + \ln J(m)$$
where $\mu_m$ is the mean feature vector of class m, Σ is the covariance matrix and J(m) is the prior probability of class m [42]. The decision boundary M is defined as follows, assuming that the prior probabilities of all classes are fixed. The separability of classes increases as the factor of 1
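A minimal NumPy sketch of the linear discriminant function above, using a pooled (shared) covariance matrix and class priors J(m) estimated from class frequencies, for illustration; the function names and the pooled-covariance estimator are assumptions.

```python
import numpy as np

def fit_lda(X, y):
    """Class means, priors J(m) and inverse pooled covariance from labelled data."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])          # J(m)
    centred = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    sigma = centred.T @ centred / (len(X) - len(classes))           # shared covariance
    return classes, means, priors, np.linalg.inv(sigma)

def discriminants(X, model):
    """s_m(g) = g^T S^-1 mu_m - 0.5 mu_m^T S^-1 mu_m + ln J(m) for every row g of X."""
    _, means, priors, sigma_inv = model
    lin = X @ sigma_inv @ means.T
    const = -0.5 * np.sum((means @ sigma_inv) * means, axis=1) + np.log(priors)
    return lin + const

def predict(X, model):
    """Pick the class with the largest discriminant (highest posterior)."""
    return model[0][np.argmax(discriminants(X, model), axis=1)]
```

Because the quadratic term in g is shared by all classes when the covariance is common, it cancels in the comparison, which is what makes the decision boundary linear.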

Detrended Fluctuation Analysis (detrended FA)
The Detrended FA, which is closely related to Hurst exponent analysis, extends conventional fluctuation analysis so that the correlation properties of a signal can be estimated across a range of time scales [43]. The random-walk principle is central to Detrended FA: for a time series q_i (i = 1, …, Q) with mean ḡ, the profile is defined as [44]

$$G(p) = \sum_{i=1}^{p}\left(q_i - \bar{g}\right), \qquad p = 1, \dots, Q$$

The profile is then divided into Q_u = ⌊Q/u⌋ non-overlapping segments, each of length equal to the scale u. Least-squares fitting is used to approximate a local polynomial trend x_u(p) within each segment j, and the profile detrended at scale u is

$$G_u(p) = G(p) - x_u(p)$$

The fluctuation function at scale u is then given by the root mean square of G_u(p):

$$F(u) = \sqrt{\frac{1}{Q_u u}\sum_{p=1}^{Q_u u} G_u^{2}(p)}$$

This trend-eliminated root-mean-square displacement, represented by equation (59), must be determined for various scales u.
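The steps above (profile, segmentation, local least-squares detrending, RMS fluctuation) can be sketched in NumPy as below. This is a minimal first-order DFA under stated assumptions: segments are taken only from the start of the profile, and the scaling exponent is read off as the slope of log F(u) against log u; the function name and scale choices are illustrative.

```python
import numpy as np

def dfa(x, scales, order=1):
    """Return F(u) for each scale u and the log-log scaling exponent."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())               # G(p)
    fluct = []
    for u in scales:
        n_seg = len(profile) // u                   # Q_u = floor(Q / u)
        p = np.arange(u)
        sq = []
        for j in range(n_seg):
            seg = profile[j * u:(j + 1) * u]
            coeffs = np.polyfit(p, seg, order)      # least-squares local trend x_u(p)
            resid = seg - np.polyval(coeffs, p)     # detrended profile G_u(p)
            sq.append(np.mean(resid ** 2))
        fluct.append(np.sqrt(np.mean(sq)))          # F(u)
    fluct = np.array(fluct)
    exponent = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return fluct, exponent
```

For uncorrelated white noise the exponent is expected to be close to 0.5, and for its cumulative sum (a random walk) close to 1.5, which makes a convenient sanity check.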

Firefly
The firefly algorithm, developed by Yang [44], simulates the flashing behaviour of fireflies. The following assumptions are made in order to model this behaviour. All fireflies are of a single kind (unisex), so any firefly can be attracted to any other. Brighter fireflies attract less bright ones. The attraction between two fireflies decreases as the distance between them increases. If no firefly is brighter than a given firefly, it moves randomly. The brightness of a firefly is determined by the landscape of the objective function. The brightness k is represented as

$$k(x) \propto E(x)$$

where E(x) is the objective function. The light intensity at distance p is

$$k(p) = k_0 e^{-\beta p^{2}}$$

where k_0 is the original light intensity and β is the light absorption coefficient. The attractiveness α is given by

$$\alpha(p) = \alpha_0 e^{-\beta p^{2}}$$

where p is the distance between two fireflies and α_0 is the attractiveness at p = 0. The distance p_{mn} between two fireflies located at z_m and z_n is estimated as

$$p_{mn} = \lVert z_m - z_n \rVert = \sqrt{\sum_{j=1}^{d}\left(z_{m,j} - z_{n,j}\right)^{2}}$$

where z_{m,j} is the jth spatial coordinate of z_m and d is the number of dimensions. The movement of firefly m attracted to a brighter firefly n is determined by

$$z_m \leftarrow z_m + \alpha_0 e^{-\beta p_{mn}^{2}}\left(z_n - z_m\right) + r\left(\mathrm{rand} - \tfrac{1}{2}\right)$$

where r is a randomization parameter in [0, 1] and rand is a random number drawn from [0, 1].
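A minimal NumPy sketch of the movement rule above is given here, applied to a toy minimization problem in which a lower cost means a brighter firefly. The parameters alpha0, beta and r mirror α_0, β and r; the population size, step decay, bounds and toy objective are assumptions, and rand is drawn uniformly from [0, 1] as in Yang's original formulation. How the firefly search is mapped onto ECG classification is not described in this section, so the sketch illustrates only the optimizer.

```python
import numpy as np

def firefly_minimize(objective, dim, n=20, max_iter=150, alpha0=1.0, beta=1.0,
                     r=0.2, decay=0.98, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective`; a lower cost is treated as a brighter firefly."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    z = rng.uniform(lo, hi, size=(n, dim))            # firefly positions z_m
    cost = np.array([objective(zm) for zm in z])
    best = int(np.argmin(cost))
    best_z, best_cost = z[best].copy(), float(cost[best])

    def update(m, new_pos):
        # Move firefly m, re-evaluate it and track the best position seen so far.
        nonlocal best_z, best_cost
        z[m] = np.clip(new_pos, lo, hi)
        cost[m] = objective(z[m])
        if cost[m] < best_cost:
            best_z, best_cost = z[m].copy(), float(cost[m])

    for _ in range(max_iter):
        for m in range(n):
            moved = False
            for k in range(n):
                if cost[k] < cost[m]:                      # firefly k is brighter than m
                    p2 = np.sum((z[m] - z[k]) ** 2)         # squared distance p_mk^2
                    attract = alpha0 * np.exp(-beta * p2)   # attractiveness at p_mk
                    update(m, z[m] + attract * (z[k] - z[m]) + r * (rng.random(dim) - 0.5))
                    moved = True
            if not moved:                                   # no brighter firefly: random move
                update(m, z[m] + r * (rng.random(dim) - 0.5))
        r *= decay                                          # shrink the random step over time
    return best_z, best_cost
```

For example, `firefly_minimize(lambda v: float(np.sum(v ** 2)), dim=2)` searches for the minimum of a 2-D sphere function; that objective is chosen purely for illustration.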

Training and testing
The dimensionally reduced ECG epoch values and the epoch values obtained from CS and HSO feature selection are used separately for training and testing. Training follows a regression-style approach in which the MSE of each classifier is driven to its minimum, with a target training MSE of zero for all classifiers. K-fold cross-validation is used in this work. The dataset is first divided into K equal-sized subsets (folds). In each step, K − 1 folds are used to train the classifier and the remaining fold is used for performance evaluation. The validation cycle is repeated K times, so that every fold serves once as the test set, and the classifier performance is computed from the K results. The value of K in this work is set to 10; as a result, 90 % of the epochs are used for training and 10 % for testing in each cycle. The epochs of each class are distributed equally across the folds. The low MSE values attained with CS feature selection for the different classifiers serve as a marker of good classifier performance.
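The fold construction described above, with each class's epochs spread evenly over K = 10 folds, can be sketched in NumPy as follows. The function name and shuffling seed are assumptions; a library implementation such as a stratified K-fold splitter would normally be used instead.

```python
import numpy as np

def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt evenly across k folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        fold_of[idx] = np.arange(len(idx)) % k   # round-robin assignment of class-c epochs
    for f in range(k):
        test = np.flatnonzero(fold_of == f)      # one fold (about 10 %) for testing
        train = np.flatnonzero(fold_of != f)     # remaining k - 1 folds for training
        yield train, test
```

For example, `stratified_kfold(np.array([0] * 333 + [1] * 1406))` (the CS-selected VT and NSR epoch counts) yields ten (train, test) index pairs, each test fold holding 33 or 34 VT epochs.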

Mean Square Error (MSE)
Mean Square Error (MSE) is the average of the squared differences between the observed (predicted) values and the target values. Because of randomness, the MSE is almost always strictly positive rather than zero. The MSE is monitored to observe the training and testing process [45]:

$$\mathrm{MSE} = \frac{1}{M}\sum_{k=1}^{M}\left(G_k - S_l\right)^{2}$$

where G_k is the observed value at a particular time, S_l is the target value at l, and M is the number of epochs.
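As a small illustration of the definition above, the MSE between observed outputs and targets can be computed as below; the function name is hypothetical and the inputs are plain arrays of equal length.

```python
import numpy as np

def mse(observed, target):
    """Mean of squared differences between observed outputs G and targets S."""
    observed = np.asarray(observed, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((observed - target) ** 2))
```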

Hyperparameter tuning approaches for enhancement of classifiers
In order to build a machine learning algorithm with excellent performance, one of the most important steps is to tune the classification model's hyperparameters.In this paper, Grid Search Optimization (GSO) and the Adam technique are used to tune the hyperparameters of several classification algorithms.

Grid Search Optimization (GSO) approach
GSO is employed in several machine learning strategies to find the ideal parameters, with cross-validation used to guide the outcome metrics [46]. A grid search is an exhaustive search that can be used to compute the ideal values of various hyperparameters [47]: it generates every possible combination of the candidate parameter values and records the performance of each combination. This approach can save the time and effort otherwise spent on manual tuning. Tuning these parameters yields several candidate classifier configurations [48], and by varying the hyperparameters in this way GSO delivers the best solution within the searched grid. Fig. 5 presents the flowchart of the GSO hyperparameter methodology for the firefly classification algorithm, and Algorithm 1 in the appendix outlines the strategy for optimizing the firefly hyperparameters using the GSO approach; there, m_k and m_{k−1} denote the present and past iterations of the GSO optimizer.
In this work, the firefly classification model hyperparameters are γ, α_min and rand, and tuning them optimizes the effectiveness of the firefly approach. Each firefly hyperparameter is initialized with a random value in the range [0, 1] using rand_int. The optimal maximum iteration counts (maxiter) for the firefly algorithm and for GSO were found to be 1000 and 500, respectively, and the firefly population size (n) is set to 40 in this study. This iterative process identifies the hyperparameter values associated with the lowest error rate, which are taken as the optimal hyperparameters. The same GSO optimization process is applied to the GMM, EM and BDLC classifiers.
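The exhaustive search described above can be sketched in plain Python as follows. The `evaluate` callable is a hypothetical stand-in for the cross-validated error rate of a classifier trained with the given hyperparameters; the grid keys simply reuse the names γ and α_min, and the toy error surface in the test is an assumption for illustration.

```python
import itertools

def grid_search(evaluate, grid):
    """Exhaustively evaluate every combination in `grid` and keep the lowest error."""
    names = list(grid)
    best_params, best_err = None, float("inf")
    history = []                                   # record of every combination tried
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = evaluate(params)                     # e.g. cross-validated error rate
        history.append((params, err))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err, history
```

The cost grows with the product of the grid sizes, which is why the grid is usually kept coarse.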

Adaptive moment estimation (Adam) approach
Stochastic optimization is an essential part of both deep neural networks and machine learning approaches, and Adam provides a way to carry out this process [49]. The Adam method is easy to implement, fast and memory-efficient, which makes it well suited to problems involving large datasets and many parameters. Adam combines two extensions of stochastic gradient descent: adaptive gradients (AdaGrad) and root-mean-square propagation (RMSProp) [50]. Rather than computing the exact gradient over the complete dataset, it builds a stochastic approximation from a randomly chosen subset of the data, which makes each update much cheaper to compute. Adam maintains exponential moving averages of the gradient and of the squared gradient. The hyperparameters are updated as follows [51]:

$$D_t = R_1 D_{t-1} + (1 - R_1)\frac{\partial L}{\partial g_t}$$

$$K_t = R_2 K_{t-1} + (1 - R_2)\left(\frac{\partial L}{\partial g_t}\right)^{2}$$

$$\hat{D}_t = \frac{D_t}{1 - R_1^{t}}, \qquad \hat{K}_t = \frac{K_t}{1 - R_2^{t}}$$

$$g_{t+1} = g_t - l\,\frac{\hat{D}_t}{\sqrt{\hat{K}_t} + \epsilon}$$

where D_t is the first-moment estimate, K_t is the second-moment estimate, g_t denotes the current (old) hyperparameters, g_{t+1} the tuned hyperparameters, l the learning rate, ε, R_1 and R_2 are constants, and ∂L/∂g_t is the gradient, with respect to g_t, of the loss function L to be minimized. In this work the loss is the error rate e of the model, which has to be reduced as much as possible; pq and pq − 1 denote the present and past iterations of the Adam optimizer.

Fig. 6. Flowchart of the Adam hyperparameters methodology for the GMM classification algorithm.

The flowchart of the Adam hyperparameter methodology for the GMM classification algorithm is shown in Fig. 6, and Algorithm 2 in the appendix outlines the strategy for optimizing the GMM hyperparameters using the Adam approach. In the GMM model, the hyperparameters β_k, μ_k and Σ_k are used in place of the generic hyperparameter g in the equations above. In this study, the Adam constants are set to l = 0.0009, R_1 = 0.74, R_2 = 0.82 and ε = 10^{−7}. The optimal maximum iteration counts (maxiter) for GMM and for Adam were found to be 750 and 300, respectively. This iterative process identifies the hyperparameter values associated with the lowest error rate, which are taken as the optimal hyperparameters. The same Adam optimization process is applied to the EM, BDLC and firefly classifiers. Table 9 lists the hyperparameters of the different classifiers together with the limiting values for each.
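The update equations above can be sketched in NumPy with the constants reported in this study as defaults. This is a minimal sketch that assumes a differentiable loss whose gradient is supplied by a hypothetical `grad_fn`; the error rate of a GMM is not directly differentiable, so in practice a surrogate loss or numerical gradient would be needed, which is not shown here.

```python
import numpy as np

def adam_minimize(grad_fn, g0, lr=0.0009, r1=0.74, r2=0.82, eps=1e-7, max_iter=300):
    """Adam updates of parameters g given a function returning dL/dg."""
    g = np.asarray(g0, dtype=float).copy()
    d = np.zeros_like(g)                      # first-moment estimate D_t
    k = np.zeros_like(g)                      # second-moment estimate K_t
    for t in range(1, max_iter + 1):
        grad = grad_fn(g)
        d = r1 * d + (1 - r1) * grad          # moving average of the gradient
        k = r2 * k + (1 - r2) * grad ** 2     # moving average of the squared gradient
        d_hat = d / (1 - r1 ** t)             # bias-corrected moments
        k_hat = k / (1 - r2 ** t)
        g = g - lr * d_hat / (np.sqrt(k_hat) + eps)
    return g
```

For instance, with the gradient 2(g − 1) of the toy loss (g − 1)², `adam_minimize(lambda g: 2.0 * (g - 1.0), [0.0], max_iter=5000)` moves g toward 1 in steps of roughly l per iteration.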

Results and discussion
The performance metrics evaluated in this work are OA, F1 score, GDR, MCC and ER, all of which are computed from the confusion matrix [44,52]. The confusion matrix comprises four quantities: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). TP is the number of abnormal segments correctly identified as abnormal, TN is the number of normal segments correctly identified as normal, FP is the number of normal segments incorrectly identified as abnormal, and FN is the number of abnormal segments incorrectly identified as normal. The performance metrics are defined as follows [53]. The overall accuracy, which measures the overall performance of the classification system, is

$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}$$

The Error Rate (ER), also known as the misclassification rate, measures the proportion of samples misclassified into either the positive or the negative class:

$$\mathrm{ER} = \frac{FP + FN}{TP + TN + FP + FN} = 1 - \mathrm{OA}$$

The Good Detection Ratio (GDR) is a crucial criterion of a detector. The F1 score is the harmonic mean of precision and sensitivity (recall):

$$F1 = \frac{2\,TP}{2\,TP + FP + FN}$$

The Matthews Correlation Coefficient (MCC) measures the correlation between the observed and predicted classes [52]:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

MCC ranges from −1 to +1. In this work, values from 0 to 0.4 indicate poor agreement between the observed and predicted classes, and values from 0.5 to 1 indicate good agreement, with 1 representing perfect agreement. The summarized average results for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes for the different dimensionally reduced values without CS and HSO feature selection, using the different classifiers, are shown in Table 10. The corresponding results with CS feature selection are shown in Table 11, and those with HSO feature selection in Table 12. The summarized average results with CS feature selection using GSO and Adam hyperparameter tuning of the different classifiers are shown in Table 13, and those with HSO feature selection using GSO and Adam hyperparameter tuning in Table 14.
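The metric definitions above can be computed directly from the four confusion-matrix counts, as in the sketch below. GDR is omitted because its formula is not reproduced in this section; the function name and the choice to return 0 for MCC when a marginal count is zero are assumptions.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """OA, ER, F1 and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    oa = (tp + tn) / total                     # overall accuracy
    er = (fp + fn) / total                     # error (misclassification) rate
    f1 = 2 * tp / (2 * tp + fp + fn)           # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"OA": oa, "ER": er, "F1": f1, "MCC": mcc}
```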
Table 10 presents the consolidated average results for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes for the different dimensionally reduced values without CS and HSO feature selection, using the different classifiers. With LLE dimensionality reduction and no feature selection, the classifier error rates ranged from 42.14 % to 46.97 % in the VT vs. NSR case. The GMM classifier achieves the highest average values: 57.86 % overall accuracy, 39.54 % F1 score, 37.57 % GDR and 0.1394 MCC. In the PVC vs. NSR case, the error rate ranged from 42.07 % to 46.04 %, and the GMM classifier achieves the highest average values: 57.93 % overall accuracy, 44.76 % F1 score, 39.77 % GDR and 0.1491 MCC. In the ST vs. NSR case, the error rate ranged from 37.41 % to 46.24 %, and the GMM classifier achieves the highest average values: 62.59 % overall accuracy, 58.58 % F1 score, 49.00 % GDR and 0.2775 MCC. With DM dimensionality reduction and no feature selection, the classifier error rates ranged from 43.91 % to 49 % in the VT vs. NSR case. The Firefly classifier achieves the highest average values: 56.09 % overall accuracy, 36.61 % F1 score, 34.28 % GDR and 0.0921 MCC. In the PVC vs. NSR case, the error rate ranged from 43.22 % to 46.72 %, and the Firefly classifier achieves the highest average values: 56.78 % overall accuracy, 43.32 % F1 score, 37.56 % GDR and 0.1245 MCC. In the ST vs. NSR case, the error rate ranged from 38.91 % to 48.23 %, and the Firefly classifier achieves the highest average values: 61.09 % overall accuracy, 57.82 % F1 score, 45.67 % GDR and 0.2574 MCC. With LE dimensionality reduction and no feature selection, the classifier error rates ranged from 44.02 % to 48.32 % in the VT vs. NSR case. The EM classifier achieves the highest average values: 55.98 % overall accuracy, 35.62 % F1 score, 34.51 % GDR and 0.0783 MCC. In the PVC vs. NSR case, the error rate ranged from 43.3 % to 47.85 %, and the EM classifier achieves the highest average values: 56.70 % overall accuracy, 42.59 % F1 score, 37.77 % GDR and 0.1152 MCC. In the ST vs. NSR case, the error rate ranged from 39.81 % to 45.38 %, and the Firefly classifier achieves the highest average values: 60.19 % overall accuracy, 54.94 % F1 score, 45.56 % GDR and 0.2169 MCC.
Table 11 presents the consolidated average results for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes for the different dimensionally reduced values with CS feature selection, using the different classifiers. With LLE dimensionality reduction and CS feature selection, the classifier error rates ranged from 23.26 % to 42.67 % in the VT vs. NSR case. The Firefly classifier achieves the highest average values: 76.74 % overall accuracy, 53.47 % F1 score, 71.82 % GDR and 0.4105 MCC. In the PVC vs. NSR case, the error rate ranged from 23.61 % to 42.37 %, and the Firefly classifier achieves the highest average values: 76.39 % overall accuracy, 58.75 % F1 score, 71.75 % GDR and 0.4392 MCC. In the ST vs. NSR case, the error rate ranged from 21.61 % to 41.39 %, and the Firefly classifier achieves the highest average values: 78.39 % overall accuracy, 72.10 % F1 score, 74.90 % GDR and 0.5512 MCC. With DM dimensionality reduction and CS feature selection, the classifier error rates ranged from 27.42 % to 42.64 % in the VT vs. NSR case. The Firefly classifier achieves the highest average values: 72.58 % overall accuracy, 50.23 % F1 score, 64.80 % GDR and 0.3684 MCC. In the PVC vs. NSR case, the error rate ranged from 28.03 % to 41.02 %, and the Firefly classifier achieves the highest average values: 71.97 % overall accuracy, 54.44 % F1 score, 64.62 % GDR and 0.3744 MCC. In the ST vs. NSR case, the error rate ranged from 29.01 % to 41.79 %, and the Firefly classifier achieves the highest average values: 70.99 % overall accuracy, 62.53 % F1 score, 64.79 % GDR and 0.3944 MCC. With LE dimensionality reduction and CS feature selection, the classifier error rates ranged from 31.21 % to 40

The results show that the proposed approach can correctly detect the VT vs. NSR, PVC vs. NSR and ST vs. NSR arrhythmia classes using the GMM classifier with Adam hyperparameter tuning, with accuracies of 98.38 %, 98.48 % and 98.92 %, respectively. We compare the detection performance of our approach (LLE, DM and LE dimensionality reduction, with and without CS and HSO feature selection, combined with the GMM, EM, NLR, LR, BDLC, Detrended FA and Firefly classifiers and with GSO and Adam hyperparameter tuning) against existing approaches in the literature that apply various dimensionality reduction techniques and classifiers to the MIT-BIH Arrhythmia database; eight existing ECG beat classification and detection approaches were identified for this comparison. Table 16 summarizes the number of cardiac arrhythmia types classified and the overall accuracies of our approach and of the eight existing approaches.

Conclusion
Cardiac vascular arrhythmias (VT, PVC and ST change) are irregular cardiac rhythms, and they are very dangerous to human health. Cardiac arrest, chest pain, fluttering and myocardial infarction are all associated with cardiac vascular diseases. In this work, ECG signals obtained from the MIT-BIH database are analyzed to detect VT, PVC, ST and normal rhythms using different classifiers. The results show that the classifiers with hyperparameter tuning perform better than the untuned classifiers, both with and without CS and HSO feature selection. An accuracy of 98.38 % is achieved for VT vs. NSR by the GMM classifier with Adam hyperparameter tuning, using LLE dimensionality reduction and HSO feature selection. The same configuration achieves 98.48 % accuracy for PVC vs. NSR detection and 98.92 % accuracy for ST vs. NSR detection. With 98.92 % accuracy in detecting ST vs. NSR, the Adam-tuned GMM classifier outperforms all other classifiers. Deep learning methods such as Convolutional Neural Networks (CNN) will be the focus of future work.
i) Various CVD-based ECG signals are reduced in dimension via LLE, DM and LE.
ii) The dimensionality of the CVD-based ECG signals is further reduced through CS and HSO feature selection.
iii) The dimensionally reduced values and the CS- and HSO-selected features are then given to the GMM, EM, NLR, LR, BDLC, Detrended FA and Firefly classifiers to detect ventricular arrhythmias from ECG signals.
iv) Hyperparameter tuning is applied to the GMM, EM, BDLC and Firefly classifiers; the GSO and Adam approaches are used to determine the optimized hyperparameters for each of them.
v) Finally, the classifier outcomes are examined and validated with and without feature selection, and with hyperparameter tuning, using OA, F1 score, GDR, MCC and error rate as the performance metrics.

Fig. 4 shows the normal plot of LE features for the VT, PVC, ST and normal cases, and it reveals that the features of the various classes overlap. Therefore, to attain good classification accuracy, the selection of a classifier is more

Fig. 5. Flowchart of the GSO hyperparameter methodology for the firefly classification algorithm.

Fig. 7 .Fig. 8 .
Fig. 7. Performance of average accuracy for different dimensionality reduction techniques with and without CS and HSO feature selection of different classifiers for the VT vs. NSR Case.

Fig. 9. Performance of average accuracy for different dimensionality reduction techniques with and without CS and HSO feature selection of different classifiers for the ST vs. NSR Case.

Fig. 10. Performance of average accuracy for different dimensionality reduction techniques for CS feature selection with different classifiers using GSO and Adam hyperparameter tuning for VT vs. NSR, PVC vs. NSR and ST vs. NSR cases.

Fig. 11. Performance of average accuracy for different dimensionality reduction techniques for HSO feature selection with different classifiers using GSO and Adam hyperparameter tuning for VT vs. NSR, PVC vs. NSR and ST vs. NSR cases.

Table 1
Details of MIT-BIH database.

Local Linear Embedding (LLE), Diffusion Map (DM) and Laplacian Eigen (LE) are used to reduce the dimension of the ECG data. After dimensionality reduction, the VT signal consists of 2167 epochs, the PVC signal of 2889 epochs, the ST change signal of 4200 epochs and the NSR signal of 7088 epochs. Feature selection is then performed using the Cuckoo Search (CS) technique. After CS feature selection, the VT signal consists of 333 epochs, the PVC signal of 444 epochs, the ST change signal of 778 epochs and the NSR signal of 1406 epochs. The dimensionality-reduced ECG samples, with and without CS feature selection, are given as input to the non-linear classifiers to detect possible ventricular arrhythmias. The following section explains the three dimensionality reduction techniques.

Table 2
Average statistical parameters at different dimensionality reduction techniques without CS and HSO feature selection for VT, PVC, ST and normal cases.

Table 3
Average statistical parameters at different dimensionality reduction techniques with CS and HSO feature selection for VT, PVC, ST and normal cases.
In the VT case without CS and HSO feature selection, S_l ranges from 1 to 24, with M equal to 2167 epochs. In the PVC case without CS and HSO feature selection, S_l ranges from 1 to 32, with M equal to 2889 epochs. In the ST case without CS and HSO feature selection, S_l ranges from 1 to 56, with M equal to 4200 epochs. In the NSR case without CS and HSO feature selection, S_l ranges from 1 to 36, with M equal to 7088 epochs. In the VT case with CS and HSO feature selection, S_l ranges from 1 to 24, with M equal to 333 epochs. In the PVC case with CS and HSO feature selection, S_l ranges from 1 to 32, with M equal to 444 epochs. In the ST case with CS and HSO feature selection, S_l ranges from 1 to 56, with M equal to 778 epochs. In the NSR case with CS and HSO feature selection, S_l ranges from 1 to 36, with M equal to 1406 epochs. Table 4 displays the average MSE values and confusion matrices for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes using different classifiers without CS and HSO feature selection for the various dimensionality reduction techniques. Table 5 gives the corresponding values for the different classifiers with CS feature selection, and Table 6 with HSO feature selection. The average MSE values and confusion matrices with GSO and Adam hyperparameter tuning of the different classifiers are shown in Table 7 for CS feature selection and in Table 8 for HSO feature selection, for the different dimensionality reduction techniques.

Table 4
Average MSE values and confusion matrix for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes with different classifiers without CS and HSO feature selection for different dimensionality reduction techniques.

Table 5
Average MSE values and confusion matrix for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes for different classifiers with CS feature selection in different dimensionality reduction techniques.

Table 6
Average MSE values and confusion matrix for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes for different classifiers with HSO feature selection in different dimensionality reduction techniques.

Table 7
Average MSE values and confusion matrix for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes with GSO and Adam hyperparameter tuning based on different classifiers with CS feature selection for different dimensionality reduction techniques.

Table 8
Average MSE values and confusion matrix for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes with GSO and Adam hyperparameter tuning based on different classifiers with HSO feature selection for different dimensionality reduction techniques.

Table 9
Different classifiers' hyperparameters and their ranges.
.98 % in the VT vs. NSR case. The Firefly classifier achieves the highest average values: 68.79 % overall accuracy, 41.34 % F1 score, 59.44 % GDR and 0.2400 MCC. In the PVC vs. NSR case, the error rate ranged from 31.86 % to 40.99 %, and the Firefly classifier achieves the highest average values: 68.14 % overall accuracy, 46.44 % F1 score, 59.33 % GDR and 0.2592 MCC. In the ST vs. NSR case, the error rate ranged from 28.1 % to 40.45 %, and the Firefly classifier achieves the highest average values: 71.90 % overall accuracy, 64.82 % F1 score, 65.58 % GDR and 0.4256 MCC. Table 12 presents the consolidated average results for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes for the different dimensionally reduced values with HSO feature selection, using the different classifiers. With LLE dimensionality reduction and HSO feature selection, the classifier error rates ranged from 13.6 % to 37.04 % in the VT vs. NSR case. The GMM classifier achieves the highest average values: 86.40 % overall accuracy, 72.30 % F1 score, 84.50 % GDR and 0.6666 MCC. In the PVC vs. NSR case, the error rate ranged from 12.35 % to 37.48 %, and the GMM classifier achieves the highest average values: 87.65 % overall accuracy, 78.93 % F1 score, 86.05 % GDR and 0.7296 MCC. In the ST vs. NSR case, the error rate ranged from 10.46 % to 31.58 %, and the GMM classifier achieves the highest average values: 89.54 % overall accuracy, 86.96 % F1 score, 88.41 % GDR and 0.7979 MCC. With DM dimensionality reduction and HSO feature selection, the classifier error rates ranged from 18.22 % to 29.09 % in the VT vs. NSR case. The GMM classifier achieves the highest average values: 81.78 % overall accuracy, 65.58 % F1 score, 78.21 %

Table 10
Summarized average result analysis for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes of different dimensionally reduced values without CS and HSO feature selection using different classifiers.

Table 11
Summarized average result analysis for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes of different dimensionally reduced values with CS feature selection using different classifiers.

Table 14
Summarized average result analysis for VT vs. NSR, PVC vs. NSR, and ST vs. NSR classes of different dimensionally reduced values with HSO feature selection using different classifiers for Adam hyperparameter tuning.

Table 15
Computational complexity for VT vs. NSR, PVC vs. NSR, and ST vs. NSR cases of different dimensionally reduced values with CS and HSO feature selection using different classifiers for GSO and Adam hyperparameter tuning approaches.
GDR and 0.5850 MCC. In the PVC vs. NSR case, the error rate ranged from 15.95 % to 28.97 %, and the EM classifier achieves the highest average values: 84.05 % overall accuracy, 73.83 % F1 score, 81.36 % GDR and 0.6613 MCC. In the ST vs. NSR case, the error rate ranged from 13.72 % to 29.01 %, and the EM classifier achieves the highest average values: 86.28 % overall accuracy, 83.27 % F1 score, 84.36 % GDR and 0.7377 MCC. With LE dimensionality reduction and HSO feature selection, the classifier error rates ranged from 20.83 % to 33.72 % in the VT vs. NSR case. The GMM classifier achieves the highest average values: 79.17 % overall accuracy, 59.28 % F1 score, 74.95 % GDR and 0.4921 MCC. In the PVC vs. NSR case, the error rate ranged from 17.33 % to 34.68 %, and the GMM classifier achieves the highest average values: 82.67 % overall accuracy, 72.20 % F1 score, 79.41 % GDR and 0.6405 MCC. In the ST vs. NSR case, the error rate ranged from 14.89 % to 33.83 %, and the GMM classifier achieves the highest average values: 85.11 % overall accuracy, 82.09 % F1 score, 82.80 % GDR and 0.7192 MCC. Tables 13 and 14 summarize the average results for the VT vs. NSR, PVC vs. NSR and ST vs. NSR classes of

Table 16
Summary of existing works for cardiac arrhythmias detection from MIT-BIH Database.